A. INTRODUCTION

This paper briefly reviews the information explosion of the last thirty
years and the various attempts made to organize that information in new
ways. Section B offers a brief historic review of modern classification
and subject heading theory.

Section C reviews the literature of automatic indexing, automatic
abstracting, and automatic classification. The problems of large file
organization, word meanings, and the limitations of such "automatic"
methods are discussed.

Section D sums up the state of the art in automatic indexing by
concluding that human intellectual effort is still required in indexing.
The computer is viewed as a valuable assistant in that intellectual
effort, and the wide variety of computer applications to indexing work
is summarized.



B. THE "INFORMATION EXPLOSION" AND ATTEMPTS TO MEET IT

"The cost of manual classification and abstracting of all the articles
in the world's hundred-thousand technical periodicals would be
fantastic. The practicality of carrying it out in a coordinated timely
way by manual methods is unrealizable. There is also a pressing need to
extend the coverage of a myriad of unpublished working papers. Hence,
there is an utter necessity for automatic indexing, abstracting and
summaries made by electronic data processing."^1

The "information explosion," alluded to above, and the growing
difficulty experienced in handling that volume of information in an
efficient way to serve particular users have combined to create an
increased interest in unconventional, often machine-aided, information
retrieval systems. Lilley has described the radical changes in
information flow during the past fifty years and related these changes
to the rise of nonhierarchical classification ideas and new subject
analysis techniques.^2

Bourne illustrates the magnitude of the paper problem with these
examples:^3 (1) The federal government produces twenty-five billion
pieces of paper a year. There is now enough research to fill 7.5
Pentagons with file cabinets at a cost of over four billion dollars a
year. (2) Military engineering drawings and documents cost two billion
dollars a year and yield six million drawings which must be added to
the fifty million already on file. (3) There are 30,000 technical
journal publications with over two million articles annually.^4
Obviously all of these pieces of paper do not contain new information,
as anyone who currently reads the journals, abstracts and microfilms of
a field will quickly discover.^5



^1 B. F. Cheydleur. "Information Retrieval - 1966." DATAMATION, 7:10,
(October, 1961), pp. 21-25. Quote is from p. 21.



^2 Jesse H. Shera. "Trends in Subject Analysis Practice - 1950 to Today"
[...] (D. L. S. Thesis, Columbia University School of Library Service),
1959, pp. 14ff. [...] Concept in Subject [...] 1953), p. [...] gives a
similar assessment of these changes.

^3 Charles P. Bourne. METHODS OF INFORMATION HANDLING, Wiley, 1963 [...]

^4 [...] dispute; K. P. Barr in his article, "The Estimate of the Number
of Currently Available Scientific and Technical Periodicals." JOURNAL OF
DOCUMENTATION, 23:2 (June, 1967), pp. 110-116 [...]

^5 [...] stressed by Jesse H. Shera in his "The Sociological
Relationships of Information Science." JOURNAL OF THE AMERICAN SOCIETY
FOR INFORMATION SCIENCE, (Henceforth JASIS), 22:2, (March-April, 1971),
pp. 76-80. "...but it is neither [...] a paper explosion, it is not
knowledge that [...] is increasing at an exponential rate." p. 78. See
also Y. Bar-Hillel, "Is Information Retrieval Approaching a Crisis?"
AMERICAN DOCUMENTATION, [...]



As new information techniques arose to meet new information demands,
new types of alphabetically arranged indexing aids such as thesauri
were created.^6 Computer software programs were developed to supplement
human efforts in information retrieval. These developments once again
focused attention on the lack of a consistent, comprehensive code for
subject heading creation and ordering.

Harris has considered the computer implications of rigorous definition
in subject analysis. She found that the theory of subject headings had
not greatly advanced since Cutter.^7 Pollard and Bradford pointed out
over forty years ago that alphabetical subject indexes created problems
because of their hidden, often unacknowledged, classification
schemes.^8 Richmond investigated the concealed classification in the
Library of Congress Subject Heading List under "cats." She found that
much inconsistency seemed to be related to an unawareness of a
classificatory scheme in the listing.^9 Daily's exhaustive study of the
same subject heading list showed that once the words of a subject
heading were chosen, the form of expression of that heading was largely
determined by the need to fit the heading form into the existing
structure of headings.^10 Frarey found that the emphasis during the
1940s shifted from "cataloger's choice" in headings to the use of
common terms in the Library of Congress list. This shift caused an
increasing complexity in the form of the subject heading list.^11

Since alphabetical lists with subdivisions are often based on some type
of classification, several authors have proposed that subarrangement or
ordering of heading words or elements be made explicit by designating
aspects formally. Kaiser developed a scheme for entry subdivision based
on "concrete" and "process" aspects of a subject.^12 Prevost advocated
a "noun rule" for the order of words within a heading.^13 Metcalf
proposed that specificity of subject be absolute with relation to the
object of study, and that specification of the aspects from which that
object is studied be limited by the topic and the needs of the
particular library.^14

Modern exposition of "facet analysis" was begun by S. R. Ranganathan,
who applied his now famous principles of "Personality," "Matter,"
"Energy," "Space," and "Time" to indexable matter in order to create
indexing access. He then developed a notational system utilizing the
"colon" for linking these facets to one another.^15



^6 Arthur L. Korotkin, Lawrence H. Oliver, and R. Burgis, INDEXING AIDS,
PROCEDURES AND DEVICES, Report No. RADC-TR-64-582, (Bethesda, Maryland:
General Electric Company, Information Systems Operation, April, 1965),
AD 616 342, 110 p.; and Charles Bernier, "Indexing and Thesauri."
SPECIAL LIBRARIES, 59 (February, 1968), pp. 98-103. Both publications
survey developments and use of indexing aids.

^7 Jessica Harris. SUBJECT ANALYSIS: COMPUTER IMPLICATIONS OF RIGOROUS
DEFINITIONS. Scarecrow, (1970), 279 p.; see especially page 13.

^8 A. F. C. Pollard and S. Bradford. "The Inadequacy of Alphabetical
Subject Index." ASLIB PROCEEDINGS OF THE SEVENTH INTERNATIONAL
CONFERENCE. (1930), pp. 39-45.

^9 Phyllis A. Richmond. "Cats: An Example of Concealed Classification in
Subject Headings," LIBRARY RESOURCES AND TECHNICAL SERVICES, 3:2,
(Spring, 1959), pp. 102-112.

^10 J. E. Daily, "The Grammar of Subject Headings." (D. L. S. Thesis,
Columbia University School of Library Service, 1957) (unpublished),
222 p.

^11 Carlyle J. Frarey. "Subject Heading Revision by the Library of
Congress," (Masters Paper, Columbia University School of Library
Service, 1951), 97 p.; and his "A History of Subject Cataloging
Principles and Practices in the United States, 1850-1954." (D. L. S.
study in progress, Columbia University School of Library Service).

^12 J. Kaiser. SYSTEMATIC INDEXING, Pitman, (1911).

^13 Marie Louise Prevost. "Approach to Theory and Method in General
Subject Headings," LIBRARY QUARTERLY, 16:2 (April, 1946), pp. 140-151.

^14 John W. Metcalf. INFORMATION INDEXING AND SUBJECT CATALOGUING.
Scarecrow, (1957), 338 p.

^15 See his PROLEGOMENA TO LIBRARY CLASSIFICATION. Madras Library
Association, (1937), Second Edition, Library Association, UK, (1957),
487 p.; and his "Subject Headings and Facet Analysis." JOURNAL OF
DOCUMENTATION, 20:3 (September, 1964), pp. 109-119.
Later developments of the Colon classification and facet analysis
indicate that application of facet principles is no easy matter, but
must be carefully guided by a complex set of rules. The British
Classification Research Group has been the major champion of facet
classification research applications in England.^16



Farradane has stressed the primacy of relationships between concepts
and developed a system of relational "operators" used to connect
concepts into sets called "analets."^17 Coates developed a formula for
dealing with the multiple relationships between the various parts of a
heading and for determining the order of words or elements in a
heading.^18 Schemes such as those discussed above have not been
extensively used in the United States. In 1960, Tauber and Lilley
developed a faceted classification and index for a proposed Educational
Media Research Information Service.^19 More recently Barhydt and
Schmidt produced a thesaurus based on an analysis of facets and
intended for use within the ERIC system.^20



C. AUTOMATIC ABSTRACTING, INDEXING, AND CLASSIFICATION

While nonhierarchical classification schemes and theories were being
developed, other researchers sought to automate the process of subject
heading selection and ordering. Closely related research was undertaken
in the areas of creating abstracts (or "extracts") automatically and
classifying documents through automatic processes.



Prywes and Litofsky point out that the main justifications for all
automatic processing of information are "cost, personnel availability
and service quality problems."^21 In 1967, Borko reviewed the types of
automatic indexing and classification developed up to that time. He
noted that the major difficulty was not in the counting and correlating
of the characteristics of written language, but rather the problem was
centered on what measures or counts are to be used in selecting actual
terms for indexing. A basic assumption of all automatic indexing which
utilizes counting or statistical methods is that, excluding function
words like articles, conjunctions, and prepositions, "the more
frequently a word is used in a document the more likely it is a
significant indicator of subject matter."^22
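
As a present-day illustration of this assumption (not drawn from the
studies reviewed here), the following Python sketch discards function
words and ranks the remaining words of a text by raw frequency. The stop
list FUNCTION_WORDS, the function candidate_terms, and the sample
sentence are hypothetical and were chosen only for the example; an
operational stop list would be far longer.

```python
import re
from collections import Counter

# Hypothetical, deliberately short list of function words.
FUNCTION_WORDS = {
    "a", "an", "the", "and", "or", "but", "of", "in", "on",
    "to", "for", "by", "with", "is", "are",
}

def candidate_terms(text, top_n=3):
    """Return the top_n most frequent non-function words with their counts."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in FUNCTION_WORDS)
    return counts.most_common(top_n)

sample = ("The index is built by the computer. The computer counts words, "
          "and the index lists the words the computer counts.")

# The most frequent content word is offered first as an index-term candidate.
print(candidate_terms(sample))
```

Raw counts of this kind are only a starting point; which counts or
measures should actually govern term selection is the open question
that Borko describes.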



Doyle summarized developments in automatic classification; he observed
that the increasing size of information stores on magnetic tape made
some kind of organization of information essential in order to avoid
costly sequential searches.^23 Stevens has produced an updating of her
1965 state of the art report on automatic
^16 Classification Research Group. "Need for a Faceted Classification as
a Basis for all Methods of Information Retrieval." LIBRARY ASSOCIATION
RECORD, 57:7 (July, 1955), pp. 262-268; see also "CRG Bulletin #[...]."
JOURNAL OF DOCUMENTATION, 12:4 (December, 1956), pp. 227-230 and
273-298.

^17 J. E. L. Farradane. "A Scientific Theory of Classification and
Indexing." JOURNAL OF DOCUMENTATION, 6:2 (June, 1950), pp. 83-99.

^18 E. J. Coates. SUBJECT CATALOGUES: HEADINGS AND STRUCTURE, Library
Association, UK, (1960), 186 p.

^19 Maurice F. Tauber and Oliver L. Lilley. FEASIBILITY STUDY REGARDING
THE ESTABLISHMENT OF AN EDUCATIONAL MEDIA RESEARCH INFORMATION SERVICE,
Columbia University School of Library Service, (1960), 235 p.

^20 Gordon C. Barhydt and Charles T. Schmidt. INFORMATION RETRIEVAL
THESAURUS FOR EDUCATION. Case Western Reserve University, (1970),
131 p.

^21 Noah S. Prywes and Barry Litofsky. "All-Automatic Processing for a
Large Library." SPRING JOINT COMPUTER CONFERENCE, 36, May 5-7, 1970.
AFIPS Press, (1970), pp. 323-333; quote is from page 323. John O'Connor
stresses the same factors in his article, "Some Remarks on Mechanized
Indexing, and Some Small Scale Results." MACHINE INDEXING: PROGRESS AND
PROBLEMS, papers at the Third Institute on Information Storage and
Retrieval, February 13-17, 1961, American University, (1962), 354 p.,
pp. 266-279.

^22 Harold Borko. "Indexing and Classification," in AUTOMATED LANGUAGE
PROCESSING, edited by H. Borko. Wiley, (1967), pp. 99-125; quote is
from p.

^23 Lauren B. Doyle, "Is Automatic Classification a Reasonable
Application of the Statistical Analysis of Text?" JOURNAL OF THE
ASSOCIATION FOR COMPUTING MACHINERY, (Henceforth JACM), 12:4 (October,
1965), pp. 473-489.
indexing.^24 The original report maintained that automatic indexing
could increase the speed of indexing, increase the ease and speed of
reindexing and reclassification, and increase the consistency of
indexing efforts through consistent machine processes. The second
report gives examples of small sample demonstrations which have shown
the technical feasibility of statistical association methods for
indexing. Such methods utilize the computer to count and calculate
correlations among words and/or documents. Stevens stresses the need
for extensive computer processing and analyses of large amounts of
textual material in various subject fields similar to that done by
Dennis in the legal field.^25

More recently Batty has reviewed the last ten years of work in his
"Automatic Generation of Index Languages."^26 He found that the basic
statistical processes used in all of this work derived from a
calculation based on the frequency of terms or words and their
co-occurrences. Further he stresses that, apart from Stiles,^27 all of
the work has been done with very small samples and that Doyle^28 has
stressed the fantastic increase in costs when such techniques are
applied to large information files.
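
The counting of terms and their co-occurrences can be sketched as
follows. This is a minimal illustration under stated assumptions: the
four small "documents" are invented, the function names are
hypothetical, and the association score used here (documents containing
both terms divided by documents containing either term) is a simple
overlap coefficient chosen for brevity. It is not the association
factor of Stiles or any specific measure from the reports cited.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(docs):
    """Count the documents in which each term, and each pair of terms, occurs."""
    term_df = Counter()
    pair_df = Counter()
    for terms in docs:
        unique = sorted(set(terms))          # each term counted once per document
        term_df.update(unique)
        pair_df.update(combinations(unique, 2))
    return term_df, pair_df

def association(a, b, term_df, pair_df):
    """Illustrative overlap score: docs with both terms / docs with either term."""
    both = pair_df[tuple(sorted((a, b)))]
    either = term_df[a] + term_df[b] - both
    return both / either if either else 0.0

# Invented sample: each document is reduced to a list of its content words.
docs = [
    ["index", "computer", "frequency"],
    ["index", "computer"],
    ["index", "thesaurus"],
    ["classification", "thesaurus"],
]
term_df, pair_df = cooccurrence(docs)
print(association("index", "computer", term_df, pair_df))
print(association("index", "thesaurus", term_df, pair_df))
```

Because every pair of terms within a document is counted, the table of
pairs can grow roughly with the square of the vocabulary, which gives
some feeling for the cost increases on large files that Doyle stresses.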



The field of automatic indexing and classification has grown to include
a variety of methods: (1) statistical methods which use word frequency
measures such as those which were first proposed separately by Luhn and
Baxendale,^29 (2) positional methods which are based on the position of
certain words in titles, or in key sentences, or in a specific
relationship to prepositional forms, (3) assignment methods which are
based on a dictionary of included or excluded terms as well as key
terms such as "summary" or "in conclusion."^30



^24 Mary E. Stevens. AUTOMATIC INDEXING: A STATE-OF-THE-ART REPORT. U.S.
Bureau of Standards, Monograph #91, (March 30, 1965), 290 p.; reissued
with additions and corrections, February, 1970; the revision contains
over 800 new bibliographic items.

^25 S. W. Dennis. "The Design and Testing of a Fully Automatic
Indexing-Searching System for Documents Consisting of Expository Text."
INFORMATION RETRIEVAL -- A CRITICAL VIEW, edited by G. Schecter, based
on the Third Annual Colloquium on Information Retrieval, May 12-13,
1966, Thompson Books, (1967), pp. 67-94.

^26 C. David Batty. "Automatic Generation of Index Languages." JOURNAL
OF DOCUMENTATION, 25:2 (June, 1969), pp. 142-151. P. E. Jones points
out that the statistical methods suggested for automatic documentation
are similar to those used in psycholinguistics and content analysis in
his article, "Historical Foundations of Research on Statistical
Association Methods for Mechanized Documentation," in STATISTICAL
ASSOCIATION METHODS FOR MECHANIZED DOCUMENTATION, edited by M. E.
Stevens, V. E. Giuliano and L. B. Heilprin, Symposium proceedings,
March 17-19, 1964, U.S. Government Printing Office, U.S. Bureau of
Standards, Misc. Publication #269, (December 15, 1965), pp. 3-8. In a
later article written with M. Curtice, "A Framework for Comparing Term
Association Measures," AMERICAN DOCUMENTATION, 18:3 (July, 1967),
pp. 153-161, he notes the overall similarity of such measures and their
relation to the 2x2 contingency tables often used for two group
comparisons.

^27 H. E. Stiles. "The Association Factor in Information Retrieval,"
JACM, 8 (April, 1961), pp. 271-279; and "Progress in the Use of the
Association Factor in Information Retrieval," in PROCEEDINGS OF A
SYMPOSIUM ON MATERIALS INFORMATION RETRIEVAL, Dayton, Ohio: Air Force
Materials Laboratory, November 28-29, 1962, Technical Document
ASD-TDR-63-445, (May, 1963), pp. 143-153.

^28 Lauren B. Doyle. "Breaking the Cost Barrier in Automatic
Classification." System Development Corporation, Report #SP-2516,
(July 1, 1966), 62 p.

^29 H. P. Luhn. "A Statistical Approach to Mechanized Encoding and
Searching of Literary Information." IBM JOURNAL OF RESEARCH AND
DEVELOPMENT, 1 (1957), pp. 309-317. Phyllis Baxendale. "Machine-made
Index for Technical Literature--An Experiment." IBM JOURNAL OF RESEARCH
AND DEVELOPMENT, 2:4 (October, 1958), pp. 355-361.

^30 P. Zunde summarizes a wide variety of each type of research in
progress as he presents his own Formal Automatic Indexing of Scientific
Texts (FAST) system; see AUTOMATIC INDEXING OF MACHINE READABLE
ABSTRACTS OF SCIENTIFIC DOCUMENTS, Bethesda, MD., Documentation, Inc.,
Report #AFOSR65-1425, (September, 1965), 213 p. J. E. Rush, R. Salvador
and A. Zamora, "Automatic Abstracting and Indexing. II, Production of
Indicative Abstracts by Application of Contextual Inference and
Syntactic Coherence Criteria." JASIS, 22:4 (July-August, 1971),
pp. 260-274. The authors summarize the previous work of H. P.
Edmundson, "New
Because the state of the art has been summarized by Stevens, Batty,
Borko, and Zunde, as well as other researchers, this [...] related to
the computer and subject indexing and offer summaries of those
experiments of particular relevance to the present study. A long sought
goal of automatic indexing has been the creation of software programs
which would function as well as, or in a manner similar to, the
functioning of the human mind. Such procedures are very different from
the usual methods of searching, classifying, or indexing, as Bush has
observed:

"The real heart of the matter of selection, however, goes deeper than a
lag in the adoption of mechanisms by libraries, or a lack of
development of devices for their use. Our ineptitude in getting at the
record is largely caused by the artificiality of systems of indexing.
When data of any sort are placed in storage, they are filed
alphabetically or numerically, and information is found (when it is) by
tracing it down from subclass to subclass. ... The human mind does not
work that way. It operates by association. With one item in its grasp,
it snaps instantly to the next that is suggested by the association of
thought, in accordance with some intricate web of trails carried by the
cells of the brain... Man cannot hope to duplicate this mental process
artificially, but he certainly ought to be able to learn from it."^31



The seeking after "some intricate web" of associations by programs was
not the beginning. The first use of the machine in indexing was the
manipulation of index entries previously selected by human
indexers.^32 Clapp has observed that the discovery of punched cards
began a whole new approach to content analysis:

"It was suddenly realized, for one thing, that the punched card, so far
from being a mere transcript of a book-keeper's ledger or a census
form, represented a very respectable intellectual exercise, one
involving the logic of classes. It now became possible to think of
content analysis in terms of the intersection of little circles, and
"co-ordinate indexing," led by the Uniterm system and making a virtue
of necessity, offered the prospect that grammar might now be dispensed
with and the world be analyzed into elemental or atomic concepts and
recombined at will."^33



Mortimer Taube studied existing indexing systems for the Armed Forces
Technical Information Agency and then created the Uniterm system, which
sought to avoid the problems of heading order altogether by allowing
word associations throughout the system. Taube's system, not to be
confused with later systems under similar names, was one of the early
formalizations of an indexing system based on words extracted from
text.^34
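
The "intersection of little circles" that Clapp describes corresponds
to intersecting sets of document numbers. The following is a minimal
sketch, assuming that each single-word term simply carries the set of
documents posted under it. The document numbers, the terms, and the
function names are invented for the example, and the code is not a
reconstruction of Taube's actual procedures.

```python
from collections import defaultdict

def build_postings(documents):
    """Map each single-word term to the set of document numbers posted under it."""
    postings = defaultdict(set)
    for doc_id, terms in documents.items():
        for term in terms:
            postings[term].add(doc_id)
    return postings

def coordinate_search(postings, *terms):
    """Return the documents posted under every one of the requested terms."""
    sets = [postings.get(term, set()) for term in terms]
    return set.intersection(*sets) if sets else set()

# Invented collection: document number -> single-word terms assigned to it.
documents = {
    101: {"steel", "corrosion", "pipes"},
    102: {"steel", "welding"},
    103: {"corrosion", "aluminum"},
}
postings = build_postings(documents)
print(coordinate_search(postings, "steel", "corrosion"))  # {101}
```

No heading order is involved: the terms are combined only at search
time, which is the feature that made the approach attractive and, as
later sections note, also the source of its ambiguities.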



Methods in Automatic Extracting." JACM, 16:2 (April, 1969),
pp. 264-285. Edmundson and his associates developed [...] selection:
the Cue, Key, Title, and Location methods. The key method utilizes
Luhn's frequency of occurrence ideas. The location method [...] those
of Baxendale. The cue method uses [...] words, stigma words, and null
words to weight sentences. The title method uses a glossary of title
and subtitle words as a means of adding weight to sentences which
contain words in that glossary. Edmundson [...] the process of [...].
Rush, Salvador, and Zamora have experimented with methods whose main
contribution lies in the area of selecting sentences for exclusion from
the [...] a dictionary called the WORD CONTROL LIST [...] words and
phrases. Each textual sentence [...] dictionary to find its semantic
weight and its syntactic value. The result of this process is sentence
retention or deletion.

[...] pp. 818-819; E. M. Fair, "Inventions and Books--What of the
Future?" LIBRARY JOURNAL, LXI (1938), pp. 47-51.

^34 Mortimer Taube and I. S. Wachtel. "The Logical Structure of
Coordinate Indexing." AMERICAN DOCUMENTATION, [...]; "Functional
Approach to Bibliographic Organization: A [...]," in BIBLIOGRAPHIC
ORGANIZATION, edited by [...], University of Chicago Press,
Because titles of documents are relatively easy to identify and
process, various systems have been developed for using title words as
substitutes for created subject headings. Ohlman is generally credited
with the development of permuted indexes.^35 Such indexes under the
names "KWIC" and "KWOC" have grown to be standard tools in many fields.
Questions still remain concerning the adequacy of title words in
representing the contents of documents. Feinberg has recently completed
a major study in this area,^36 and Vickery offers a summary of recent
research on this aspect of the question.^37 Both authors conclude that
unaided title extracting methods have gone as far as they can go.
Noninformative titles and specific user needs will require human
intervention and selection.
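
A permuted title index of the KWIC (keyword in context) type can be
sketched in a few lines. The sketch below is an illustration only, not
Ohlman's or Luhn's procedure: the stop list, the column width, and the
function name kwic are assumptions, and the two titles are taken from
works cited in this review.

```python
# Hypothetical stop list: title words that receive no index entry.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "and", "for", "to"}

def kwic(titles, width=30):
    """Return one line per significant title word, sorted by keyword,
    with the keyword aligned in a central column."""
    entries = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            if word.lower() in STOP_WORDS:
                continue
            left = " ".join(words[:i])        # context before the keyword
            right = " ".join(words[i + 1:])   # context after the keyword
            entries.append((word.lower(), left, word.upper(), right))
    entries.sort(key=lambda entry: entry[0])
    return [f"{left[-width:]:>{width}}  {key}  {right[:width]}"
            for _, left, key, right in entries]

titles = [
    "The Grammar of Subject Headings",
    "Automatic Generation of Index Languages",
]
for line in kwic(titles):
    print(line)
```

Every entry is derived mechanically from the title alone, so a
noninformative title yields noninformative entries; this is the
limitation that the studies cited above identify.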



Luhn proposed that function words be deleted and the remaining text
words be statistically analyzed as a means of creating abstracts and
finding indexing terms.^38 Fasana notes that the reaction to rigidly
structured indexing schemes with controlled vocabularies led to a
return to the single word as the basic indexing unit.^39 Nevertheless,
the major problems of synonymy, ambiguity, and the lack of a syndetic
apparatus soon forced a reassessment of the idea of "single word"
indexing. With working experience it became obvious that terms should
not only be bound together, but also that various indicators of
relationship were needed. "Roles" and "links" were developed:

"The doctrine of equal values for terms was not maintained, and
hierarchical relationships were introduced. It was not possible to
dispense with subject authority and cross-reference systems. Even the
assumed advantage of free coordination and the almost unlimited
possibility for the combination of terms turned out, in many cases, to
be disadvantages requiring changes in the structure of headings. Terms
had to be linked in order to prevent undesirable coordination. Role
indicators were introduced to serve as standard subdivisions or
modifiers and polyterms or 'concepts' were created to reduce false
drops."^40
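
The "false drops" and the linking of terms mentioned in this passage
can be illustrated with a small sketch. The document, its terms, and
both matching functions are hypothetical; the code shows only the
general idea of links (terms grouped so that they may be coordinated
only within a group), not the practice of any particular system.

```python
def matches_coordinate(query, terms):
    """Free coordination: every query term is posted somewhere on the document."""
    return query <= terms

def matches_linked(query, links):
    """Linked coordination: all query terms must fall within one link group."""
    return any(query <= group for group in links)

# Hypothetical document on the lead coating of copper pipes.
links = [{"lead", "coating"}, {"copper", "pipes"}]
terms = set().union(*links)

query = {"copper", "coating"}             # a request about copper coatings
print(matches_coordinate(query, terms))   # True: a false drop
print(matches_linked(query, links))       # False: the link blocks it
```

The link groups restore part of the structure that a subject heading
would have carried, which is the sense of the passage quoted above.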

With all of the problems found in conventional alphabetic subject
arrangement, including word order in headings, forms for phrase
headings, and subdivisions, it "still remains to be proved that any of
the newer systems meet the needs of a large collection as well as, or
any better than, alphabetic subject headings."^41 Bernier has
criticized text derivative indexing, maintaining that the important
things to be indexed are the concepts which the words in a text only
symbolize.^42 Swanson has also noted that problems arise because the
same concept may be expressed with different words; he insists that the
size



^35 H. Ohlman. "Permutation Indexing: Multiple-Entry Listings on
Electronic Accounting Machines," System Development Corporation,
(November 5, 1957), unpublished. Luhn and Rocketdyne Corporation were
also working on this problem.

^36 Hilda Feinberg. A COMPARATIVE STUDY OF TITLE DERIVATIVE INDEXING
TECHNIQUES, Columbia University School of Library Service, D. L. S.
Thesis, (1972), 390 p.

^37 Brian C. Vickery. "Document Description and Representation." ANNUAL
REVIEW OF INFORMATION SCIENCE AND TECHNOLOGY, 5. Encyclopedia
Britannica, (1971), pp. 113-140; especially pp. 118-119.

^38 H. P. Luhn, op. cit. See also the collection of his works edited by
Claire K. Schultz. H. P. LUHN: PIONEER OF INFORMATION SCIENCE, SELECTED
WORKS. Spartan Books, (1968), A.S.I.S., 320 p.

^39 Paul Fasana, "A Definition of Indexing." TUTORIAL SESSIONS ON
INDEXING, edited by B. Flood for the New York Chapter of A.D.I., Drexel
Press, Drexel Library School Series, #24, pp. 1-43.

^40 Susan Artandi and Theodore C. Hines. "Roles and Links, or Forward to
Cutter." AMERICAN DOCUMENTATION, 14:1 (January, 1963), pp. 74-77. See
also B. Montague, "Testing, Comparison, and Evaluation of Recall,
Relevance, and Cost of Coordinate Indexing with Links and Roles."
AMERICAN DOCUMENTATION, 16:3 (July, 1965), pp. 201-208.

^41 Bella E. Schachtman. "Subject Indexing Mythology." LIBRARY RESOURCES
AND TECHNICAL SERVICES, 8:3 (Summer, 1964), pp. 236-247; quote is from
page 237.

^42 Charles L. Bernier. "Subject Index Production." LIBRARY TRENDS, 16:3
(January, 1968), pp. 389-397. See also C. L. Bernier and Evan J. Crane,
"Correlative Indexes VIII: Subject Indexing vs. Word Indexing," JOURNAL
OF CHEMICAL DOCUMENTATION, 2:2 (April, 1962), pp. 117-122.
of a language sample processed for automatic indexing is a crucial
problem.^43

On the other hand, Spiegal defends the basic assumption of automatic
indexing:

"We assume that the information contained in a message is carried by
the words that make it up and by the manner in which they are strung
together. Further we assume that a person generating a message ...
chooses words in a non-random fashion and combines them according to
semantic and syntactic rules that are regular and, at least in our
culture, predictable."^44

Moss has also noted the crucial importance of words in indexing:

"We are, in fact, always indexing words, and it makes no difference
whether the words symbolize documents or subjects in them ... only
words can be indexed ... for a long time to come the only symbols which
all of the different specialists and non-specialists have in common and
in which there is any sort of agreement for indexing, storage, and
retrieval are the words of everyday speech and their printed
equivalents."^45

Granted that all indexing must, of necessity, deal with words taken
from text or with words which verbalize text concepts, serious problems
remain concerning the relationship of words to the concepts they
represent. Any research effort which utilizes words in text has to deal
with the problem of word meanings. Williams summarizes the problem:

"The essence of the retrieval problem is that some concepts are
referred to by more than one term, and some terms refer to more than
one concept. Thus, the multiple meanings cause both false hits and
missing true hits."^46
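
The two failures Williams names can be made concrete with a toy
collection. In the sketch below, each invented document records the
word it actually uses and the concept it is about; the data and the
function names are hypothetical and serve only to show how a search on
word form alone produces both kinds of error.

```python
# Hypothetical collection: document number -> (word used, concept meant).
DOCS = {
    1: ("cell", "biology: living cell"),
    2: ("cell", "electricity: battery cell"),
    3: ("battery", "electricity: battery cell"),
}

def search_by_word(word):
    """Retrieve documents by matching the word form only."""
    return {doc for doc, (used, _) in DOCS.items() if used == word}

def relevant_to(concept):
    """Documents that are actually about the requested concept."""
    return {doc for doc, (_, meant) in DOCS.items() if meant == concept}

retrieved = search_by_word("cell")
wanted = relevant_to("electricity: battery cell")

false_hits = retrieved - wanted   # one term, several concepts
missed = wanted - retrieved       # one concept, several terms
print(false_hits, missed)         # {1} {3}
```

Document 1 is a false hit because "cell" names more than one concept;
document 3 is a missed true hit because the wanted concept is also
expressed by another word.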

Preschel has investigated the relationship between conceptual agreement
among indexers and actual term selection agreement among those same
indexers.^47 She found that the previous measures of indexer
consistency did not distinguish term consistency and concept
consistency. Her study indicates that while concept agreement was
relatively high, agreement about the words to represent concepts was
much lower.

Borko illustrates the word problem with the statement, "Our host turned
on the barbeque spit." He points out that this phrase may be
interpreted in numerous ways depending on the context.^48 Wyllys
differentiates between "polysemic ambiguity," which refers to the fact
that a word or group of words may have more than one meaning, and
"string ambiguity," which is created by attaching one word to another
as in the case of "scientific information handling."^49 Weiss^50 has
explored the



^43 Donald R. Swanson. "Research Procedures." MACHINE INDEXING: PROGRESS
AND PROBLEMS, op. cit., pp. 281-304; and "Searching Natural Language
Text by Computer," SCIENCE, 132 (1960), pp. 1099-1104.

^44 Joseph Spiegal and Edward Bennett. "A Modified Statistical
Association Procedure for Automatic Document Content Analysis and
Retrieval," STATISTICAL ASSOCIATION METHODS FOR MECHANIZED
DOCUMENTATION, op. cit., pp. 47-60; quote is from p. 47.

^45 R. Moss, "Minimum Vocabularies in Information Indexing." JOURNAL OF
DOCUMENTATION, 22:2 (September, 1967), pp. 176-199; quote is from
p. 183. James L. Dolby and Howard L. Resnikoff are also defenders of
the necessity of studying words; see their "On the Structure of Written
English Words." LANGUAGE, 40:2 (April-June, 1964), pp. 167-196;
especially p. 167.

^46 John H. Williams, Jr. "Functions of a Man-Machine Interactive
Information Retrieval System," JASIS, 22:5 (September-October, 1971),
pp. 311-317; quote is from p. 316.

^47 Barbara Preschel. "Indexer Consistency in Perception of Concepts and
in Choice of Terminology." D. L. S. Study in progress, Columbia
University School of Library Service, (1972).

^48 Harold Borko, "Automatic Indexing Process." TUTORIAL SESSIONS ON
INDEXING, op. cit., p. 121.

^49 R. E. Wyllys. "Extracting and Abstracting by Computer." AUTOMATED
LANGUAGE PROCESSING, op. cit., pp. 127-179; especially p. 169.

^50 S. Weiss. "Automatic Resolution of Ambiguities from Natural Language
Text." REPORT ON ANALYSIS, DICTIONARY CONSTRUCTION, USER FEEDBACK,
CLUSTERING, AND ON-LINE RETRIEVAL, (henceforth ISR-18), Cornell
University, Report ISR-18, (October, 1970), pp. IV-5ff.
problems of words with multiple meanings for the SMART system.^51 He
notes that Dimsdale and Lamson^52 found, when they examined the word
"cell," that in a specific field such ambiguities are not as great a
problem as expected.

Fairthorne contrasts the way in which a reader utilizes the words of a
document "to find out what someone has said," with the way in which an
indexer uses those same words to find out not only what has been said
but how what has been said will interest the kinds of readers he
serves.^53 Thus what a document is "about" depends on who is reading
the words and for what purposes. Maron sums up the language problem:

"Even when language is used primarily to convey information, we find
that it is often vague, ambiguous, imprecise, changeable, idiomatic,
and, above all, exceedingly complex ... the most amazing aspect of
language is the fact that in spite of its vagueness, ambiguity and
imprecision, human beings are able to use it with success."^54

The word problem is further complicated
by the limitations of computers and
computer programming languages. All computer
programs can read only the "form" of a
word. Crane and Bernier summarize the
machine problem:

"Machines can not think, but they
can tirelessly do many things with
great rapidity and accuracy. Machines
can not provide nonnumerical
information which human beings have
not, with forethought, put into
them."

As coordinate indexing became entangled
in the ambiguities of our language, other
means of automatic indexing going beyond
single words were sought out. Doyle has
discussed the technical problems of
automatic classification and noted that
disillusionment with single word coordinate
indexing caused a renewed interest in
using the computer to generate new kinds
of indexes.



Following Luhn, Maron and Kuhns
published a paper entitled, "On Relevance,
Probabilistic Indexing and Information
Retrieval," which introduced probability
techniques into the field of automatic
classification. They used 405 abstracts
to produce categories containing sets of
manually selected keywords. Edmundson and
Wyllys took issue with the idea of absolute
or "raw" frequencies used by Luhn. Instead
they proposed a measure of word
significance based upon the frequencies of
words compared with the frequency of words
in general. Their method compares
frequencies of words from a particular set of
documents with the frequencies of those
same words in a larger corpus of words.
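
The comparison of document-set frequencies with general-corpus frequencies
can be illustrated with a short sketch. The ratio of relative frequencies
used here is only one simple way of expressing such a comparison and is not
claimed to be the formula Edmundson and Wyllys proposed; the function name,
the add-one smoothing, and the toy word lists are all assumptions made for
illustration.

```python
from collections import Counter


def relative_frequency_scores(document_words, reference_words):
    """Score words by how much more often they occur in the document set
    than in a larger reference corpus (ratio of relative frequencies)."""
    doc_counts = Counter(document_words)
    ref_counts = Counter(reference_words)
    doc_total = len(document_words)
    ref_total = len(reference_words)

    scores = {}
    for word, count in doc_counts.items():
        doc_rate = count / doc_total
        # Assumption: add-one smoothing, so a word absent from the
        # reference corpus does not cause a division by zero.
        ref_rate = (ref_counts[word] + 1) / (ref_total + 1)
        scores[word] = doc_rate / ref_rate
    return scores


# Toy data (hypothetical): a tiny "document set" and "general corpus".
documents = "the index the catalog index index".split()
reference = "the of the and the a the in index".split()
scores = relative_frequency_scores(documents, reference)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

In this toy case a common word such as "the" has a respectable raw count but
ranks last once its frequency in the larger corpus is taken into account,
which is the contrast with "raw" frequency counting drawn above.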



Gerard Salton and M. E. Lesk. "Computer Evaluation of Indexing and Text
Processing." JACM, 15:1 (January, 1968), pp. 8-36.

B. Dimsdale and B. C. Lamson. "A Natural Language Information Retrieval
System." PROCEEDINGS OF THE I.E.E.E., 54:12 (December, 1966), pp. 1636-1640.

Robert A. Fairthorne. "Content Analysis, Specification and Control."
ANNUAL REVIEW OF INFORMATION SCIENCE AND TECHNOLOGY, 4, edited by Carlos A.
Cuadra. Encyclopedia Britannica, (1969), pp. 73-109. See also his "Functional
Analysis of Information Retrieval." U. S. GOVERNMENT RESEARCH AND DEVELOPMENT
REPORTS, 69:20 (January 10, 1969) AD 677 289.

M. E. Maron. "A Logician's View of Language-Data Processing." NATURAL
LANGUAGE AND THE COMPUTER, edited by Paul L. Garvin. McGraw-Hill, (1963), pp.
128-150; quote is from p. 136.

Evan J. Crane and C. L. Bernier. "Indexing and Index Searching." PUNCH
CARDS, THEIR APPLICATION TO SCIENCE AND INDUSTRY, edited by R. S. Casey and
J. W. Perry. Reinhold, second edition, (1958), pp. 510-527. A recent article
by J. Nievergelt and J. C. Farrar, "What Machines Can and Cannot Do," COMPUTING
SURVEYS, 4:2, (June, 1972), pp. 81-96, points out the continuing argument about
machine capacities including: Can a machine think?, Can a machine reproduce
itself?, and, Are there tasks we can prove no machine will ever be able to perform?

Lauren B. Doyle. "Is Automatic Classification a Reasonable Application of
the Statistical Analysis of Text?" JACM, op. cit., pp. 473-489.

M. E. Maron and J. L. Kuhns, "On Relevance, Probabilistic Indexing and
Information Retrieval." JACM, 7:3 (July, 1960), pp. 216-244.

H. P. Edmundson and R. E. Wyllys. "Automatic Abstracting and Indexing--
Survey and Recommendations." JACM, 4:5 (May, 1961), pp. 226-234.

Harriet R. Meadow. "Statistical Analysis and Classification of Documents."
COMMUNICATIONS OF THE ACM, 4:5 (1961). Here Meadow reports her computer
evaluation of the Edmundson and Wyllys proposal in IRAD task #0353 for the Federal
Systems Office of IBM.

As research moved beyond single words
to various combinations of word sets,
questions were immediately raised about
the relationships among words, the number
of words to be included, and the ordering
of words which are to appear together
either in subject headings or in
classification groups. Vickery has noted that the
interest in word relationships moves us
closer to the allied interests of
classification and subject heading research.

Numerous experiments have been conducted
to determine word groupings through
"association maps," "clumps," "clusters," and
"factor analysis." These experiments are
discussed here because they bear a close
relationship to the problems of multiterm
or phrase headings.

In 1958, Tanimoto formulated the problem
of classification in terms of
attributes of matrix functions. In 1961,
Borko applied this procedure to a set of
997 psychological abstracts. Later he
expanded this procedure to deal with
larger matrices in a more efficient manner.
Arnovick, Liles and Wood have reviewed
the various techniques for factor
analysis. Difondi analyzed ninety-four
documents into thirty-nine different factors
and then classed documents on the
basis of those factors.



Parker-Rhodes, Needham, and Sparck Jones
reject factor analysis because of the large
data matrices required. Their "clumping"
method seeks out words which strongly
co-occur with a given text word "x." The
process differs from that proposed by
Maron in two ways: (1) categories are not
mutually exclusive; a word may belong to
several clumps, and (2) there are no
predefined limits on the categories of terms.
Sparck Jones has elaborated a series of
clump types including "strings" and
"stars." Dale and
Dale have applied the "clump" idea in their
work at the University of Texas.
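
A minimal sketch of the co-occurrence idea behind clumping follows. It is
not the clump-finding procedure of the Cambridge Language Research Unit; it
only gathers, for a given word "x," the words that appear in largely the
same documents. The overlap measure (shared documents divided by documents
containing either word), the threshold values, and the toy documents are
assumptions chosen for illustration.

```python
def cooccurring_words(documents, x, threshold=0.5):
    """Return words whose document sets overlap strongly with those of x.

    documents: list of word lists, one per document.
    The overlap measure is |docs(x) & docs(w)| / |docs(x) | docs(w)|.
    """
    docs_with = {}
    for doc_id, words in enumerate(documents):
        for word in set(words):
            docs_with.setdefault(word, set()).add(doc_id)

    x_docs = docs_with.get(x, set())
    related = {}
    for word, word_docs in docs_with.items():
        if word == x:
            continue
        union = x_docs | word_docs
        overlap = len(x_docs & word_docs) / len(union) if union else 0.0
        if overlap >= threshold:
            related[word] = overlap
    return related


# Toy documents (hypothetical), each reduced to a list of index words.
docs = [
    ["index", "term", "file"],
    ["index", "term"],
    ["file", "tape"],
    ["index", "tape"],
]
around_index = cooccurring_words(docs, "index")
around_file = cooccurring_words(docs, "file", threshold=0.3)
print(around_index, around_file)
```

Because a group is gathered separately around each starting word, the same
word ("term" in this toy data) can turn up in more than one group, which
mirrors point (1) above: the categories are not mutually exclusive.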

Baker has utilized the latent class
analysis theory of Lazarsfeld to create
an automatic classification method
computed on the basis of the probability that
a document containing a certain pattern of
keywords belongs to a certain class.
Winter has also used a modified form of
the same idea. Williams and Hillman use
a multidimensional structure, as contrasted
with a two dimensional array, for their
experiments with automatically generated



Brian C. Vickery. "Developments in Subject Indexing." JOURNAL OF
DOCUMENTATION, 11:1 (March, 1955), pp. 1-11.

T. T. Tanimoto. "An Elementary Mathematical Theory of Classification and
Prediction." IBM, (1958), 10 p.

Harold Borko. "The Construction of an Empirically Based Mathematically
Derived Classification System," Report No. SP-585, System Development
Corporation, (October 26, 1961), 23 p. See also Harold Borko and M. D. Bernick,
"Automatic Document Classification," Technical Memo #TM-771, (November 15,
1962); and "Part II, Additional Experiments," Technical Memo #TM-771/001/00
(October 18, 1963), 33 p.

Harold Borko. "Indexing and Classification." AUTOMATED LANGUAGE
PROCESSING, op. cit., pp. 99-125.

G. W. Arnovick, J. A. Liles, and J. S. Wood. "Information Storage and
Retrieval--Analysis of the State-of-the-Art." JOURNAL OF THE SPRING JOINT
COMPUTER CONFERENCE. AFIPS Press, (1964), pp. 537-561.

N. M. Difondi, "Statistical Information Retrieval System." Griffiss Air
Force Base, New York, Rome Air Development Center, October, (1969), 56 p.

A. F. Parker-Rhodes and R. M. Needham. "The Theory of Clumps." Cambridge,
England: Cambridge Language Research Unit, Report #ML 126, (February 1960).
Karen Sparck Jones and D. J. Jackson. "Current Approaches to Classification and
Clump-Finding at the Cambridge Language Research Unit." COMPUTER JOURNAL, 10
(May, 1967), pp. 29-37.

A. G. Dale and N. Dale. "Some Clumping Experiments for Associative
Document Retrieval." AMERICAN DOCUMENTATION, 16:1 (January, 1965), pp. 5-9.

F. B. Baker. "Information Retrieval Based on Latent Class Analysis."
JACM, 9:4 (October, 1962), pp. 512-521.

P. F. Lazarsfeld. "Latent Structure Analysis." THE AMERICAN SOLDIER,
edited by S. A. Stouffer. Princeton University Press, (1950), chapters 10 and 11.

W. K. Winter. "A Modified Method of Latent Class Analysis for File
Organization in Information Retrieval Work." JACM, 12:3 (July, 1965), pp. 356-363.

classification. The use of discriminant
analysis is illustrated by Williams' study
of a small set of reference documents
previously classified by human indexers.
He derives theoretical frequencies for
each word type on the basis of this sample
and then calculates both the word
dispersion within each category and between
categories. The results of these
calculations are a set of weighted coefficients
for a set of multiple discriminant
functions.



Doyle utilized Ward's hierarchical
grouping procedure to create "association
maps" for document classification.
Later he revised his procedures to make
the cost of processing large files more
economic. Similar procedures have been
used by Stiles and Dennis. Stiles
studied 100,000 documents indexed by the
Uniterm system. His strategy was "to
generate by machine an expanded list of
request terms that will serve as a bigger
net to catch documents." This strategy
provided listings of term associations
based on frequencies of term cooccurrence
utilizing the 2x2 contingency tables for
chi-square. Meetham utilized a similar
association measure for text analysis.
Oswald went beyond word pairs to create
similar word groups of up to six words.
These high frequency cooccurring word
groups were then used for "extracts" and
as index terms.
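
The 2x2 contingency table for a pair of terms can be sketched as follows.
The code computes the ordinary Pearson chi-square statistic from four
counts; it is offered as an illustration of the kind of table involved, not
as the exact association factor Stiles used, and the function name,
argument names, and counts are hypothetical.

```python
def chi_square_2x2(n_both, n_a, n_b, n_docs):
    """Pearson chi-square for the cooccurrence of terms A and B.

    n_both: documents indexed by both A and B
    n_a, n_b: documents indexed by A and by B respectively
    n_docs: documents in the whole collection
    """
    a = n_both                       # A and B
    b = n_a - n_both                 # A without B
    c = n_b - n_both                 # B without A
    d = n_docs - n_a - n_b + n_both  # neither term
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0  # a term that is in every document or in none
    return n_docs * (a * d - b * c) ** 2 / denominator


# Hypothetical counts: 100 documents, term A in 30, term B in 40.
associated = chi_square_2x2(n_both=20, n_a=30, n_b=40, n_docs=100)
independent = chi_square_2x2(n_both=12, n_a=30, n_b=40, n_docs=100)
print(round(associated, 2), independent)
```

With 30 and 40 occurrences in 100 documents, chance alone would put the two
terms together in about 12 documents; the statistic is zero at exactly that
count and grows as the observed cooccurrence departs from it. Terms scoring
high against a request term are candidates for the expanded request list.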



Stone and Rubinoff sought word groupings
that tended to "cluster" in a few documents.
Their sample of 70,000 words from 217
reviews in COMPUTING REVIEWS was examined
for technical or specialty words and then
a Poisson probability test was utilized to
select the most likely clusters of words.
Bonner, Johnson and Lafuente, Mooers, and
Vaswani have all reported research
utilizing clusters or "lattice" structures.
Minker, Wilson, and Zimmerman have tested
the cluster concept utilizing Salton's
SMART clustering system, the Augustson
system, and the Zimn Clustering system de- 



John H. Williams, Jr. DISCRIMINANT ANALYSIS FOR CONTENT CLASSIFICATION.
Bethesda, MD: IBM Corporation, (December, 1965), 272 p. D. J. Hillman.
MATHEMATICAL THEORIES OF RELEVANCE WITH RESPECT TO THE PROBLEM OF INDEXING.
Lehigh University Center for Information Science, Report #2, (1965), 56 p.

A similar procedure has been utilized by W. G. Hoyle in his report
"Automatic Classification and Indexing: A Supplement," National Research
Council of Canada, Radio and Electrical Engineering Division, ERB-793
(November, 1968), 6 p. and appendices.

Lauren B. Doyle. "Semantic Roadmaps for Literature Searchers." JACM,
8:4 (October, 1961), pp. 553-578. J. H. Ward, Jr., and M. E. Hook. "Application
of Hierarchical Grouping Procedures to Problems of Grouping Profiles."
EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT, 23 (1963), pp. 69-82.

Lauren B. Doyle. "Breaking the Cost Barrier in Automatic Classification,"
op. cit.

H. E. Stiles. "The Association Factor in Information Retrieval." JACM,
8:2 (April, 1961), pp. 271-279. S. F. Dennis. "The Construction of a
Thesaurus Automatically from a Sample of Text." STATISTICAL ASSOCIATION
METHODS FOR MECHANIZED DOCUMENTATION, op. cit., pp. 61-148.

H. E. Stiles. "Machine Retrieval Using the Association Factor." MACHINE
INDEXING: PROGRESS AND PROBLEMS, op. cit., pp. 192-205; quote is from p. [?].

A. R. Meetham. "Probabilistic Pairs and Groups of Words in Text."
LANGUAGE AND SPEECH, 7:2 (April-June, 1964), pp. 98-106.

H. A. Oswald, Jr. "Automatic Indexing and Abstracting of the Contents of
Documents." Planning Research Corporation, (October 31, 1959), prepared for
Rome Air Development Center, RADC-TR-59-208, pp. 5-34, and 59-133.

D. C. Stone and M. Rubinoff. "Statistical Generation of a Technical
Vocabulary," AMERICAN DOCUMENTATION, 19:4 (October, 1968), pp. 411-412.

R. E. Bonner, "On Some Clustering Techniques." IBM JOURNAL OF RESEARCH
AND DEVELOPMENT, 8:1 (January, 1964), pp. 22-32. C. N. Mooers. "A Mathematical
Theory of Language Symbols in Retrieval." INTERNATIONAL CONFERENCE ON
SCIENTIFIC INFORMATION, Area #6 Reports, (1958), pp. [?]1-94. D. B. Johnson and J. M.
Lafuente. "A Controlled Single Pass Classification Algorithm for Multilevel
Clustering." ISR-18, op. cit., pp. XII-1 to XII-37. P. K. T. Vaswani. "A
Technique of Cluster Emphasis and its Application to Automatic Indexing."
IFIP CONGRESS, 68, Edinburgh, August 4-10, 1968, Booklet #6.
North Holland, (1968), pp. 61-64.

veloped at the University of Maryland.
Their test involved the use of the Medlars
system and abstracts from the IRE JOURNAL.

In the development and use of all
automatic procedures, there is a large element
of human intellectual effort as both
Richmond and Vickery point out. As early
as 1966, Baxendale stated that "statistical
models of association . . . which are
restricted to frequency of occurrence data
have reached the limits of their capacity.
Thus far, these models have not been able
to establish a warrant for meaningfully
relating language 'usage' to frequency of
occurrence." Lesk found that no choice
of frequency or correlation cutoff point
would yield word pairs he considered
reliable. He concludes that word-word
association measures may be valuable in
showing relationships which are not
normally apparent and could serve as an aid
in dictionary or thesaurus construction as
has also been suggested by Stiles. Further
he suggests that second order
associations of words not associated with each
other but both found in association with
another word may be helpful in making
retrieval more precise.
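
The notion of a second order association can be made concrete with a small
sketch. Given a table of first order (direct) associations, it lists the
pairs of words that are not associated with each other but share an
associated word. The table, the function name, and the assumption that the
table is symmetric are all hypothetical and are not taken from the studies
cited.

```python
def second_order_pairs(associations):
    """Find word pairs linked only through a shared associate.

    associations: dict mapping each word to the set of words directly
    associated with it (assumed symmetric).
    """
    pairs = set()
    for neighbors in associations.values():
        ordered = sorted(neighbors)
        for i, first in enumerate(ordered):
            for second in ordered[i + 1:]:
                # Keep the pair only if there is no direct association.
                if second not in associations.get(first, set()):
                    pairs.add((first, second))
    return pairs


# Hypothetical first order associations.
first_order = {
    "index": {"catalog", "tape", "thesaurus"},
    "catalog": {"index", "tape"},
    "tape": {"index", "catalog"},
    "thesaurus": {"index"},
}
pairs = second_order_pairs(first_order)
print(sorted(pairs))
```

Here "catalog" and "thesaurus" never occur together directly, yet both are
associated with "index," so they are reported as a second order pair, while
"catalog" and "tape" are left out because they are already directly
associated.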

It is evident that previous research
discussed here has concentrated on: (1)
analysis of language for human or
automatic indexing purposes, and (2) the
creation of indexing or search tools such as
thesauri or association maps. Adkinson
and Stearns, reviewing the use of the
computer in the library, suggested three
phases of automation in libraries:
mechanization of conventional processes,
automation of search procedures based on
subject matter, and new and different kinds
of services based on computer technology.
They concluded that, as of 1967, efforts
were largely stopped in phase two "because
of the difficulty experienced by computers
in dealing with natural language and
subjective ambiguities."



D. CONCLUDING REMARKS ON THE
STATE OF THE ART

In conclusion, the expectation that the
computer would somehow magically remove
human effort in indexing or find a new way
to meaningfully represent the contents of
documents to their potential users has
foundered on the complexities and
ambiguities of user needs and language.
Nevertheless, extremely valuable tools such
as permuted indexes have been developed
through this research.

Handling large information files
requires human intellectual effort at several
points. In the initial phase, the
information file designer's efforts can provide
file organization through classification,
or through the creation of access points
by indexing and an index display. In
subsequent phases, the intellectual efforts
shift to the user. When confronted by a
sequential file, the user must organize
a search strategy to search that file in
the most efficient way possible.

The perspective of this paper is to
give the user of an information system
all of the help possible. Thus, this
perspective advocates that the information
file designer make the effort in the
initial phase through file organization,
building of indexing vocabularies, and
creating alphabetical displays in indexes
with adequate cross references. Computers
and computer programs are seen as means of
assisting information system designers in
these efforts. Computers have proved to
be useful tools in such design work. A



Jack Minker, Gerald A. Wilson, and Barbara H. Zimmerman. EVALUATION OF
QUERY EXPANSION BY THE ADDITION OF CLUSTERED TERMS FOR A DOCUMENT RETRIEVAL
SYSTEM. University of Maryland Computer Science Center, (October, 1971), [?] p.

Phyllis A. Richmond, "Transformation and Organization of Information
Content: Aspects of Recent Research in the Art and Science of Classification."
PROCEEDINGS OF THE 31ST ANNUAL MEETING AND CONGRESS OF THE INTERNATIONAL
FEDERATION FOR DOCUMENTATION, 1965, Washington, DC: Spartan Books,
(1966), volume 2, pp. 87-106; see especially p. 95.

Brian C. Vickery. "Document Description and Representation," op. cit.,
p. 122.

Phyllis B. Baxendale. "Content Analysis, Specification, and Control."
ANNUAL REVIEW OF INFORMATION SCIENCE AND TECHNOLOGY, 1, edited by Carlos A.
Cuadra. Interscience, (1966), pp. 71-106; quote is from p. 96.

M. E. Lesk, "Word-Word Association in Document Retrieval Systems."
AMERICAN DOCUMENTATION, 20:1 (January, 1969), pp. 27-38. H. E. Stiles, "The
Association Factor in Information Retrieval," op. cit.

W. S. Cooper. "On Higher Level Association Measures." JASIS, 22:5
(September-October, 1971), pp. 354-355.

Burton W. Adkinson and C. M. Stearns, "Libraries and Machines--a Review."
AMERICAN DOCUMENTATION, 18:3 (July, 1967), pp. 121-124; quote is from p. 124.

number of examples are found on the
following pages.



Research is continuing in the direction
of more efficient handling of large files,
and more complex pattern matching programs,
seeking to represent word context by
computer programs. Title and text derivative
methods, such as KWIC or KWOC techniques,
have become almost standard tools as early
alerting devices. Also in the area of
computer-assisted indexing, the computer
has proved useful for manipulating text,
handling a variety of printing formats,
and providing multiple access points from
single bibliographic entries to which
indexing terms have been assigned by human
indexers.
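
A keyword-in-context (KWIC) listing of the kind mentioned above can be
sketched in a few lines. Every significant word of a title becomes an entry
point, shown in a fixed column with its surrounding context, and the entries
are then alphabetized by keyword. The stop list, the column width, and the
sample titles are assumptions chosen for illustration and do not reproduce
any particular KWIC program.

```python
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}


def kwic_index(titles, width=30):
    """Return KWIC lines: every significant title word becomes a keyword,
    shown in a fixed column with its left and right context."""
    entries = []
    for title in titles:
        words = title.split()
        for position, word in enumerate(words):
            if word.lower() in STOP_WORDS:
                continue
            left = " ".join(words[:position])[-width:]
            right = " ".join(words[position:])[:width]
            entries.append((word.lower(), f"{left:>{width}}  {right}"))
    entries.sort(key=lambda entry: entry[0])  # alphabetize by keyword
    return [line for _, line in entries]


# Hypothetical titles.
titles = ["Automatic Indexing of Documents", "Indexing and Abstracting"]
lines = kwic_index(titles)
for line in lines:
    print(line)
```

No human selection of terms is involved: the entry points are derived
entirely from the words of the titles, which is why such listings can be
produced quickly enough to serve as alerting devices.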



Hines, Harris, and Colverd have
reported a system of programs for
computer-assisted indexing developed at Columbia
University School of Library Service.
These programs allow "one time only"
creation of bibliographic entries in machine
readable form with manually selected
subject headings. This converted entry is
machine-expanded so that author, title,
and subject access points are created, a
dictionary catalog output is organized
alphabetically, and the index is produced
in a page and column format which is
similar to the H. W. Wilson Company
indexes. Such organization of the index
entries avoids double look-up during
searching.
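
The expansion of a single keyed entry into several access points can be
sketched as follows. This is not the Columbia system of programs; it is a
minimal illustration, under assumed field names ("author," "title,"
"subjects") and an assumed citation format, of how one entry keyed once can
yield author, title, and subject headings filed in a single alphabet.

```python
def expand_entry(entry):
    """Create one (heading, citation) pair per access point of an entry."""
    citation = f'{entry["author"]}. {entry["title"]}.'
    headings = [entry["author"], entry["title"], *entry["subjects"]]
    return [(heading, citation) for heading in headings]


def dictionary_catalog(entries):
    """File all access points from all entries in one alphabetical order."""
    pairs = [pair for entry in entries for pair in expand_entry(entry)]
    return sorted(pairs, key=lambda pair: pair[0].lower())


# A hypothetical entry, keyed once with manually selected subject headings.
entries = [
    {
        "author": "Borko, Harold",
        "title": "Indexing and Classification",
        "subjects": ["Automatic indexing", "Classification"],
    }
]
catalog = dictionary_catalog(entries)
for heading, citation in catalog:
    print(f"{heading}\n    {citation}")
```

Because the full citation is repeated under every heading, the searcher who
enters the index at any access point finds the complete entry there and need
not look it up a second time elsewhere.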



The computer has also been connected to
a variety of display devices and utilized
as a means of recording, storing, and
retrieving information such as: thesaurus
listings, records of index term usage, and
suggested index term candidates. Bern
describes a system which permits a list of
candidate terms to be displayed after they
have been derived from textual material.
Bennett has designed a "Negotiated Search
Facility" which allows an indexer to
utilize a display station to search IBM
library documents in a variety of ways.
Other such systems include BOLD,
DIALOG, AUDACIOUS, and the above
mentioned SMART system. Thompson has recently
reviewed the literature of such
interactive, display station-oriented information
systems. Artandi includes discussion of
such various computer derived or displayed
information operations in her "Document
Description and Representation."



The text processing, text searching,
and text formatting possibilities of the
computer connected with a remote display
device are tremendous. Most of the
calculations, permutations, tables, charts
and matrices of the author's dissertation
were made possible by the powerful remote
terminal system of the Columbia University
Computer Center.



Several "higher level" programming
languages allow for textual manipulation of
large, variable length character strings.



Theodore C. Hines, Jessica L. Harris, and Martin Colverd, "Experimentation
with Computer-Assisted Indexing." JASIS, 21:6, (November-December 1970),
pp. 402-405.

G. M. Bern. "Description of FORMAT, a Text Processing Program,"
COMMUNICATIONS OF THE ACM, 12 (March, 1969), pp. 141-146.

J. L. Bennett. "On-line Access to Information: NSF as an Aid to the
Indexer/Cataloger," AMERICAN DOCUMENTATION, 20:3 (July, 1969), pp. 213-220.

H. P. Burnaugh. THE BOLD USERS' MANUAL. System Development Corporation,
SDC TM-2306/004/01, (January, 1967).

R. K. Summit. "An Operational On-Line Reference Retrieval System."
PROCEEDINGS OF THE ACM NATIONAL MEETING 1967, Washington, DC: Thompson Book Co.,
(1967), pp. 51-56.

R. R. Freeman and P. Atherton, AUDACIOUS--AN EXPERIMENT WITH AN ON-LINE
INTERACTIVE REFERENCE RETRIEVAL SYSTEM USING THE UNIVERSAL DECIMAL
CLASSIFICATION AS THE INDEX LANGUAGE IN THE FIELD OF NUCLEAR SCIENCE. American Institute
of Physics, UDC Project, (April, 1968), AIP/UDC-7.

D. A. Thompson, "Interface Design for an Interactive Information
Retrieval System: A Literature Survey and a Research System Description."
JASIS, 22:6 (November-December, 1971), pp. 361-373.

Susan Artandi, "Document Description and Representation." ANNUAL REVIEW
OF INFORMATION SCIENCE AND TECHNOLOGY, 6, edited by Carlos Cuadra. Encyclopedia
Britannica, (1970), pp. 143-167.

"Computer-assisted Analysis of a Large Corpus of Current Educational
Report Vocabulary," D.L.S. dissertation, Columbia University, (1972).



This author has found SPITBOL, designed
for high speed pattern matching and text
processing, and SNOBOL4 relatively easy
to learn, conceptually understandable in a
nonscientific teaching situation, and
capable of extensive matrix and table
development.

A problem facing all textual processing
for automatic indexing and abstracting,
as well as linguistic studies, has been
the lack of suitable machine-readable text.
More and more index and abstract services
are now providing magnetic tape services
which can be utilized for this purpose.
Once the machine-readable text problem is
solved, the remaining problem is storage
space. Text processing and word
correlation demand tremendous amounts of computer
memory. This fact accounts in part for
the relatively small textual samples
utilized in much research and the amount
of study done and re-done on the same
samples. Word matching, word frequency
tables, lists, or matrices require that
the computer have sufficient memory to
store, address, and remember the results
of varied comparisons on a large scale
basis. In many machine configurations
one buys increased speed with increased
storage, a factor which often does not
make for more economical processing.

When the computer is used as a counting,
extracting, and formatting tool in the
study of language, it is a valuable
assistant in the development of better controlled
vocabulary indexing languages and indexing
aids such as thesauri. It is well to
remember that the state of the art remains
such that the computer will still do
exactly as it is told, so that the selection
of a language sample for study is not the
computer's task. The computer will process
whatever data it is given in whatever ways
it is instructed. The human intellectual
tasks remain:

1) to select a sample of language
representative of the written language of the
persons in the particular subject field
where indexing and abstracting are under
consideration;

2) to determine the depth of indexing
necessary to serve those persons and their
research needs in that specific field
before developing specific indexing
techniques and computer processing programs;
and

3) to plan for systematic feedback from
the system users to allow for continuing
relevance in the face of changing demands.



The demand that information retrieved
be specifically relevant to the interests
and purposes of the information system
user remains at the heart of the
"information exchange" which determines the
success or failure of any information system.
Even with remote display devices, enhanced
programming capacities, and on-line data
files organized for rapid retrieval, the
dynamics of the information exchange system
will remain a very human equation that is
surprisingly similar to the "reference
interviews" conducted by librarians in the
past.



Robert B. K. Dewar, SPITBOL
(February 12, 1971), Version 2.1,1

R. E. Griswold, J. Poage, and I. Polonsky. THE SNOBOL4 PROGRAMMING
LANGUAGE, Second edition, Prentice-Hall, (1971), 256 p.